
Triage 27 dependabot alerts to zero + harden vLLM 0.21.0 build docs#3

Merged
grancier merged 3 commits into master from chore/dependabot-triage on May 25, 2026
Conversation

grancier commented on May 24, 2026

Contributor

Summary

Three related currency / hygiene updates in one PR. Each commit is independently reviewable.

Commit 1 — Triage all 27 dependabot alerts to zero

Drives the open advisory count from 27 dependabot alerts (1 critical, 8 high, 17 moderate, 1 low) down to zero; npm audit from 14 findings to 0. Five vulnerable packages collapse to a small set of fixes:

| Fix | Impact |
| --- | --- |
| Drop @langchain/community | Never imported in src/. Removing it kills the ibm-cloud-sdk-core → axios → follow-redirects chain (11 axios CVEs incl. prototype-pollution highs) plus transitive ws and a redundant langsmith copy. |
| Replace @xenova/transformers (2.17.2, abandoned) with @huggingface/transformers (4.2.0) | xenova was frozen at 2.17.2 and transitively pinned to protobufjs@6.11.4, which carries the CVSS-9.8 Arbitrary Code Execution advisory plus 8 others. HF ships onnxruntime-web@1.26.x. One-line import swap: pipeline() and FeatureExtractionPipeline are signature-identical. |
| Bump direct uuid to ^13.0.2 | Clears the buffer-bounds advisory on uuidv5 (hot path: chunk IDs and Qdrant point IDs). |
| npm overrides for transitive uuid, langsmith, ws | Holds fixes when @langchain/core / openai would otherwise pull older ranges. |

Commit 2 — Bump vLLM build docs to v0.21.0

v0.19.0 → v0.21.0 introduces three breaking deltas worth flagging in vllm-setup.md:

  • C++20 build requirement (v0.21.0). gcc-10+ required; Ubuntu 22.04 (gcc-11) and 24.04 (gcc-13) defaults both fine.
  • PyTorch ≥ 2.11 (v0.20.0). Nightly ROCm index resolves automatically; wording-only change to note the floor.
  • HuggingFace transformers ≥ 5 (v0.20.0). Picked up transitively, no manual step.
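The three floors listed above can be checked before starting a long build. The following is a minimal sketch, not part of vllm-setup.md: the helper names `parse_version`, `meets_floor`, and `check_floors` are hypothetical, and it assumes that comparing only the leading dotted-integer part of a version string is good enough for nightly builds with suffixes such as `.dev…+rocm7.2`.

```python
import re

# Minimum versions taken from the list above.
FLOORS = {"gcc": (10,), "torch": (2, 11), "transformers": (5,)}


def parse_version(text):
    """Return the leading dotted-integer components of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no leading version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_floor(version, floor):
    """True if `version` is at least `floor`, comparing numeric components."""
    found = parse_version(version)
    width = max(len(found), len(floor))
    # Pad with zeros so (2, 11) and (2, 11, 0) compare as equal.
    padded_found = found + (0,) * (width - len(found))
    padded_floor = tuple(floor) + (0,) * (width - len(floor))
    return padded_found >= padded_floor


def check_floors(installed):
    """Map each known tool present in `installed` to whether it meets its floor."""
    return {
        name: meets_floor(installed[name], floor)
        for name, floor in FLOORS.items()
        if name in installed
    }
```

In practice the `installed` mapping would be filled from the output of `gcc -dumpfullversion`, `torch.__version__`, and `transformers.__version__`; how those values are collected is left out of the sketch.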

The RDNA3 workarounds — BUILD_FA=0, --enforce-eager, ROCM_ATTN backend, no FP8 / hipBLASLt — are unchanged.

Commit 3 — Harden the doc with empirical lessons from validating v0.21.0 on real hardware

Four gaps in commit 2 surfaced when actually rebuilding against ROCm 7.2.1 / gfx1100, all now documented:

  1. Default torch nightly index flips rocm6.4 → rocm7.2. The rocm7.2 index is live; older rocm6.4 wheels embed a HIP ABI that mismatches /opt/rocm 7.2 and produce undefined symbol: _ZN3c10* at runtime when vLLM's compiled extensions load against the wrong libtorch.
  2. Add CMAKE_ARGS="-DHIP_FOUND=TRUE" to the build env. The rocm7.2 nightly torch's bundled LoadHIP.cmake detects HIP correctly but no longer exports the HIP_FOUND global that vLLM's CMakeLists.txt:151 checks. Without the override the build dies with "Can't find CUDA or HIP installation" despite HIP being clearly present in the configure log.
  3. Add a python -c "import vllm._C, vllm._rocm_C" smoke immediately after the build. opt-125m alone is a false-positive sanity check — its code path uses Python fallbacks and will load + generate even with stale/broken .so files. The Llama-architecture model is the first thing that exercises kernels registered by _C, and fails at SiluAndMul.__init__ with AttributeError: '_OpNamespace' '_C' object has no attribute 'silu_and_mul' when the extensions didn't load.
  4. Restructure §8 into two stages (opt-125m for build sanity, Llama-3.1-8B for kernel sanity) and document the harmless first-request Triton JIT-compile warnings (new v0.21.0 jit_monitor feature) and the "Cannot use ROCm custom paged attention kernel, falling back to Triton" line (fallback within ROCM_ATTN, not from it — normal on RDNA3).
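The extension import check from item 3 can also be written as a small script that reports which module failed and why, rather than stopping at the first traceback. This is a sketch under assumptions: `check_extensions` and `failed` are hypothetical helper names, and the only vLLM-specific facts it relies on are the two module names quoted above.

```python
import importlib


def check_extensions(module_names):
    """Try to import each module; map name -> None on success, or an error string."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:
            # A missing or ABI-mismatched .so normally surfaces as ImportError;
            # catching broadly keeps the report complete for other failures too.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def failed(results):
    """Return only the entries that did not import cleanly."""
    return {name: err for name, err in results.items() if err is not None}
```

Run inside the build's virtual environment, `failed(check_extensions(["vllm._C", "vllm._rocm_C"]))` should come back empty before moving on to the opt-125m and Llama stages.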

Troubleshooting table gains five new rows: HIP_FOUND CMake error, torch-ABI undefined symbol, opt-125m false-pass, jit_monitor warnings, paged-attention Triton fallback.

Test plan

  • npm audit → 0 vulnerabilities
  • npx tsc --noEmit clean
  • vLLM v0.21.0 builds and runs against ROCm 7.2.1 / gfx1100 following the updated vllm-setup.md
  • make vllm brings up Llama-3.1-8B on :8000; curl /v1/models returns the model id
  • npm start runs the full pipeline: ingests for two tenants, returns grounded answers with parsed citations, cross-tenant probe returns "Not supported by available context."
  • HF transformers swap produces working embeddings — both per-tenant retrievals hit the right chunk
  • Idempotency: points_count stays stable on re-ingest (4 tenant-isolated points after cleanup of 2 pre-migration legacy points)
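The idempotency result is what one would expect from name-based (version 5) UUIDs: the same inputs always hash to the same ID, so a re-ingest overwrites points instead of adding new ones. The sketch below illustrates that property with Python's standard library rather than the project's TypeScript code. The namespace value and the `tenant:doc:chunk` naming scheme are assumptions for illustration, not the repository's actual scheme.

```python
import uuid

# Hypothetical fixed namespace; any UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/saas_docs")


def point_id(tenant_id, doc_id, chunk_index):
    """Derive a deterministic point ID from the tenant, document, and chunk position."""
    name = f"{tenant_id}:{doc_id}:{chunk_index}"
    return str(uuid.uuid5(NAMESPACE, name))
```

Including the tenant in the hashed name means two tenants ingesting the same document still get distinct point IDs.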

Operational notes

Post-merge migration. If you have an existing saas_docs Qdrant collection from before this PR, you'll find pre-PR points sitting alongside the new tenant-isolated ones. They're correctly excluded by the tenant filter, but they're noise. Clean with:

curl -s -X POST http://127.0.0.1:6333/collections/saas_docs/points/delete \
  -H 'Content-Type: application/json' \
  -d '{"filter": {"must": [{"is_empty": {"key": "metadata.tenantId"}}]}}'
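For scripted migrations, the same delete call can be built with Python's standard library. This is a rough equivalent of the curl command above, assuming the same host, collection name, and filter. The function name `build_legacy_cleanup_request` is hypothetical, and it only constructs the request so that it can be inspected before anything is deleted.

```python
import json
import urllib.request


def build_legacy_cleanup_request(base_url="http://127.0.0.1:6333", collection="saas_docs"):
    """Build (but do not send) the POST that deletes points lacking metadata.tenantId."""
    payload = {"filter": {"must": [{"is_empty": {"key": "metadata.tenantId"}}]}}
    return urllib.request.Request(
        url=f"{base_url}/collections/{collection}/points/delete",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; printing `req.full_url` and `req.data` first is a cheap way to confirm the target collection.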

🤖 Generated with Claude Code

grancier and others added 2 commits May 24, 2026 19:05
Audit before: 27 dependabot alerts (1 critical, 8 high, 17 moderate, 1 low) /
14 npm audit findings across five root-cause packages. After:
0 vulnerabilities reported by either tool.

Resolution path, in order of leverage:

- Drop @langchain/community. It was never imported. Removing it kills
  the entire @langchain/community → ibm-cloud-sdk-core → axios chain (11
  axios CVEs, follow-redirects header leak, and a stagehand-pulled ws
  instance), along with a transitive langsmith copy.

- Replace @xenova/transformers (2.17.2, abandoned at this version) with
  @huggingface/transformers (4.2.0, the maintained successor). xenova
  pinned onnxruntime-web@1.14.0 which pinned onnx-proto@4.0.4 which
  pinned protobufjs@6.11.4 — vulnerable to a CVSS-9.8 RCE plus eight
  other advisories with no upstream fix coming. The HF package ships
  onnxruntime-web@1.26.x and modern protobuf handling, and its
  pipeline() / FeatureExtractionPipeline surface is signature-identical
  to xenova. Migration is a single import line; runtime contract for
  LocalTransformersEmbeddings is unchanged.

- Bump direct uuid to ^13.0.2 to clear the buffer-bounds advisory on
  v3/v5/v6 — we use uuidv5 for both chunk IDs and Qdrant point IDs, so
  this is on the hot path.

- Pin transitive uuid, langsmith, and ws via npm overrides so the
  fixes hold even when @langchain/core or openai resolves to older
  ranges of those packages.

Verification deferred to runtime:
The HF transformers swap changes the ONNX execution path. Embedding
vectors should be byte-equivalent against the same Xenova/all-MiniLM-L6-v2
model but this hasn't been smoke-tested end-to-end. If output shifts,
the existing Qdrant collection's points become unreachable to new
queries — drop and re-ingest after merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Currency update on top of the npm triage. vLLM 0.19.0 is two minor
versions behind upstream (0.21.0 is current) and the documented build
procedure was written around 0.19.0 specifically. v0.20.0 and v0.21.0
introduced three breaking deltas that affect this guide:

- C++20 build requirement (v0.21.0). gcc-10+ required. Ubuntu 22.04
  default gcc-11 and 24.04 default gcc-13 both satisfy this — flagged
  in the prerequisites and the troubleshooting table for older distros.
- PyTorch 2.11 minimum (v0.20.0). The existing nightly index resolves
  to 2.11+ automatically; just a wording change to note the floor.
- HuggingFace transformers v5 required (v0.20.0). Picked up
  transitively by the build; no install-step change.

The RDNA3-specific workarounds (BUILD_FA=0, --enforce-eager,
ROCM_ATTN backend, no FP8, no hipBLASLt, hipBLAS+TunableOp) are all
unchanged — no release-notes mention of gfx1100 CUDA-graph stability
fixes, so --enforce-eager stays on for now.

Also bumps the ROCm minor recommendation from "7.2.x" to "7.2.2",
which v0.21.0 references explicitly (#41386).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

grancier changed the title from "Triage all 27 dependabot alerts to zero" to "Triage 27 dependabot alerts to zero + bump vLLM build docs to v0.21.0" on May 25, 2026

Four findings from validating the v0.21.0 doc bump against an actual
ROCm 7.2.1 / gfx1100 host today, each of which would have saved an
hour or so if the doc had covered it:

- Flip the default torch nightly index from rocm6.4 → rocm7.2. The
  rocm7.2 index is now published; the older rocm6.4 wheels carry a
  HIP ABI that doesn't match /opt/rocm 7.2 and produce
  `undefined symbol: _ZN3c10*` at runtime when vLLM's compiled
  extensions try to load against the mismatched libtorch.

- Add CMAKE_ARGS="-DHIP_FOUND=TRUE" to the build env. The rocm7.2
  nightly torch's bundled LoadHIP.cmake detects HIP correctly and
  prints every ROCm library version but no longer exports the
  HIP_FOUND global variable that vLLM's CMakeLists.txt:151 checks.
  Without the override the build dies with "Can't find CUDA or HIP
  installation" despite HIP being clearly present in the configure
  log.

- Add a `python -c "import vllm._C, vllm._rocm_C"` smoke immediately
  after the build. Necessary because opt-125m's code path uses Python
  fallbacks and will load + generate even with stale/broken .so files
  — making the existing smoke test a false-positive on a real ABI
  problem. The Llama-architecture model is the first thing that
  exercises kernels registered by _C.

- Restructure §8 into two stages: opt-125m for build sanity, then
  Llama-3.1-8B for kernel sanity. Update the "expected log lines"
  prefix to the new v0.21.0 TURBOQUANT-rejection wording, document
  the harmless first-request Triton JIT-compile warnings introduced
  by the new jit_monitor, and flag the "Cannot use ROCm custom paged
  attention kernel, falling back to Triton" line as expected on RDNA3
  (it's a fallback *within* ROCM_ATTN, not a fallback *from* it).

Troubleshooting table gains rows for: the HIP_FOUND CMake error, the
torch-ABI undefined-symbol case, the opt-125m false-pass scenario,
the JIT-monitor warnings, and the paged-attention Triton fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

grancier changed the title from "Triage 27 dependabot alerts to zero + bump vLLM build docs to v0.21.0" to "Triage 27 dependabot alerts to zero + harden vLLM 0.21.0 build docs" on May 25, 2026

grancier merged commit 65c4473 into master on May 25, 2026